Analysis report: New SAD for soybean rust

A New Standard Area Diagram of Soybean Rust Improves Accuracy of Visual Severity Estimates and Optimizes Resource Use.

Emerson Del Ponte (UFV), Rich Iannone (RStudio)
2020-01-20

This is a commented report of the R code used to analyse the data of the paper entitled "A New Standard Area Diagram of Soybean Rust Improves Accuracy of Visual Severity Estimates and Optimizes Resource Use".

R packages

Let's load the packages that will be used in the analysis.
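The original chunk is not shown in this rendering; at a minimum, the packages mentioned in the text below are needed (others, such as epiR, lme4, lsmeans, and irr, appear in later sections).

```r
library(tidyverse) # data wrangling and plotting
library(readxl)    # import xlsx files
library(here)      # paths relative to the project root
```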

Data preparation

The data were organized in a binary xlsx file with four spreadsheets, each containing data from a single experiment. Each experiment consisted of severity assessments on 50 leaflets by a group of 20 raters with no previous experience in visual severity assessment of soybean rust.

We will prepare a single csv file after importing and combining the data from all spreadsheets. We should first load the tidyverse package, a collection of packages we will use throughout. We then use the read_excel function of the readxl package to import each spreadsheet. The here function indicates the folder in which the xlsx file is stored; it is a very handy function for reproducibility purposes. Finally, we use the write_csv function to store the data in a non-binary (text) file.
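The import chunk is not shown; a minimal sketch of the steps just described, where the file path and sheet names are illustrative and should be adjusted to the actual xlsx file:

```r
# Sheet and file names are illustrative - adjust to the actual xlsx file
sheets <- c("old_noaid", "new_noaid", "old_aid", "new_aid")

# Read each sheet, tag it with the assessment method, and stack the rows
sad <- purrr::map_df(sheets, function(s) {
  read_excel(here("data", "sad.xlsx"), sheet = s) %>%
    mutate(method = s)
})

# Store the combined data in a non-binary (text) file
write_csv(sad, here("data", "sad.csv"))
```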

Data import and reshape

Now that we have the csv file, let's import it and create a data frame called sad_wide. We use the suffix `wide` because the responses (estimates) are not in a single column but in multiple columns, each representing a rater.
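The import chunk is not shown; a one-line sketch, assuming the csv was saved under a hypothetical data/sad.csv path:

```r
sad_wide <- read_csv(here("data", "sad.csv")) # hypothetical path
sad_wide
```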


# A tibble: 200 x 23
   method  leaf actual   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`
   <chr>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 old_n…     1   0.25     5     1     1     1   0.2   0.1   0.2   0.5
 2 old_n…     2   2.5     15     5     6     7   5    10     7     3  
 3 old_n…     3   7.24    10     8     7    13  30     0.8   8     7  
 4 old_n…     4   7.31    20     3     5    15   5     0.3  15     3  
 5 old_n…     5   9.07    15    17    13    20  17    20    18    10  
 6 old_n…     6  11.6     15    15    12    16  15    25    12     5  
 7 old_n…     7  12.5     25    10    10    12  10    10     9     8  
 8 old_n…     8  13.1     20    10    20    12  15     5    21     8  
 9 old_n…     9  14.6     20    15    18    18  15    14    22     7  
10 old_n…    10  16.1     25    15    22    19  21    30    29    12  
# … with 190 more rows, and 12 more variables: `9` <dbl>, `10` <dbl>,
#   `11` <dbl>, `12` <dbl>, `13` <dbl>, `14` <dbl>, `15` <dbl>,
#   `16` <dbl>, `17` <dbl>, `18` <dbl>, `19` <dbl>, `20` <dbl>

We need to reshape this data frame to the tidy format, where all responses are in a single column, which facilitates further work within the tidyverse. For this, we use the gather function and create the rater and estimate variables to accommodate the data from the multiple columns. We indicate the columns to gather (4 to 23), so the first three columns are kept as we want: method, leaf, and actual severity.
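A sketch of this reshaping step; converting rater to numeric is an assumption made to match the `<dbl>` type shown in the output below:

```r
sad_long <- sad_wide %>%
  gather(rater, estimate, 4:23) %>%       # one row per rater x leaf
  mutate(rater = as.numeric(rater))       # column names "1".."20" to numbers
sad_long
```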


# A tibble: 4,000 x 5
   method     leaf actual rater estimate
   <chr>     <dbl>  <dbl> <dbl>    <dbl>
 1 old_noaid     1   0.25     1        5
 2 old_noaid     2   2.5      1       15
 3 old_noaid     3   7.24     1       10
 4 old_noaid     4   7.31     1       20
 5 old_noaid     5   9.07     1       15
 6 old_noaid     6  11.6      1       15
 7 old_noaid     7  12.5      1       25
 8 old_noaid     8  13.1      1       20
 9 old_noaid     9  14.6      1       20
10 old_noaid    10  16.1      1       25
# … with 3,990 more rows

Exploratory analysis

Boxplots

Let's produce boxplots of the error of the estimates for each assessment (n = 20 raters × 50 estimates). It is interesting to see that the two unaided assessments are quite similar in the distribution of the errors, although they used different sets of random raters. The general trend is to overestimate severity. For the aided assessments, the median error was below zero for the old SAD and around zero for the new SAD.
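The plotting chunk is not shown; a minimal ggplot2 sketch, assuming the long-format data frame is named sad_long:

```r
sad_long %>%
  mutate(error = estimate - actual) %>%             # error of the estimate
  ggplot(aes(method, error)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed") + # perfect accuracy line
  labs(x = "Assessment", y = "Error of the estimates")
```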

Density plots of error

Density plots are also an interesting way to visualize the distribution of the error of the estimates. We confirm visually that the error densities of the two unaided assessments are quite similar.
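A sketch of the density plot, again assuming the long-format data frame is named sad_long:

```r
sad_long %>%
  mutate(error = estimate - actual) %>%
  ggplot(aes(error, fill = method)) +
  geom_density(alpha = 0.5) +          # overlay the four assessments
  labs(x = "Error of the estimates", y = "Density")
```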

Let's have a look at the estimates and the error of the estimates for each of the fifty leaves. We add a black line for the actual values, plus points and smoothed lines colored by assessment.

First, the absolute values.

Error of the estimates

Now we can dig more into the data and plot by rater. It is important to note that the raters are not the same for the new and old SAD, and so the assessments are independent.

And now for the error of the estimates.

In the plots above we showed the error of the estimates by leaf, and although we know that severity was incremental, we had no information on the actual values. Hence, we can make another plot with actual severity on the x axis to identify the ranges of severity with higher errors and compare the two SADs.
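A sketch of this plot, assuming the long-format data frame is named sad_long:

```r
sad_long %>%
  mutate(error = estimate - actual) %>%
  ggplot(aes(actual, error, color = method)) +
  geom_point(alpha = 0.3) +                         # individual estimates
  geom_smooth(se = FALSE) +                         # trend per assessment
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Actual severity (%)", y = "Error of the estimates")
```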

Let's produce the same scatter and error plots, but by assessment and rater.

Unaided old

Scatter plot

Error plot

Unaided new

Scatter plot

Error plot

Aided old

Scatter plot

Error plot

Aided new

Scatter plot

Error plot

Lin’s concordance
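The chunks computing the concordance statistics are not shown. Lin's concordance correlation coefficient and its components (bias coefficient, location-shift, scale-shift) can be obtained with the epi.ccc function of the epiR package; a sketch for one assessment, where the object names are illustrative and sad_long is the assumed name of the long-format data frame:

```r
library(epiR)

# Lin's CCC between actual and estimated severity for one assessment
new_aid <- subset(sad_long, method == "new_aid")
ccc_new_aid <- epi.ccc(new_aid$actual, new_aid$estimate)

ccc_new_aid$rho.c    # concordance coefficient with 95% CI
ccc_new_aid$C.b      # bias coefficient
ccc_new_aid$l.shift  # location-shift
ccc_new_aid$s.shift  # scale-shift
```

The bias coefficient, location-shift, and scale-shift returned here are the statistics compared across assessments in the mixed-model section below.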

Unaided new

Unaided old

Aided old

Aided new

Combine all estimates in a single dataframe and reshape to long format

Visualize CCC statistics

Lin’s concordance coefficient

Correlation coefficient

Linear mixed model

We will estimate the means and compare them using the lsmeans package.
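The model-fitting chunks are not shown; one way such a comparison is commonly set up is a mixed model with rater as a random effect, sketched below. The data frame ccc_by_rater and its columns are hypothetical names for per-rater statistics, not necessarily the authors' exact model; the Kenward-Roger degrees of freedom in the output are the lsmeans default for lmer fits when pbkrtest is installed.

```r
library(lme4)
library(lsmeans)

# ccc_by_rater: hypothetical data frame with one row per rater and
# method, holding that rater's Lin's CCC (column `ccc`)
m_ccc <- lmer(ccc ~ method + (1 | rater), data = ccc_by_rater)

# Estimated means per method with Tukey-adjusted pairwise comparisons
lsmeans(m_ccc, pairwise ~ method)
```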

Reshape data to wide

Concordance coefficient


 method    lsmean     SE   df lower.CL upper.CL .group
 old_noaid  0.802 0.0134 71.7    0.775    0.828  1    
 new_noaid  0.818 0.0137 73.1    0.791    0.846  1    
 old_aid    0.836 0.0137 73.1    0.809    0.864  1    
 new_aid    0.964 0.0134 71.7    0.937    0.991   2   

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 4 estimates 
significance level used: alpha = 0.05 

Correlation coefficient


 method    lsmean     SE   df lower.CL upper.CL .group
 old_noaid  0.723 0.0202 78.0    0.683    0.764  1    
 new_noaid  0.763 0.0207 78.7    0.722    0.804  1    
 old_aid    0.777 0.0207 78.7    0.736    0.818  1    
 new_aid    0.961 0.0202 78.0    0.920    1.001   2   

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 4 estimates 
significance level used: alpha = 0.05 

Bias coefficient


 method    lsmean     SE   df lower.CL upper.CL .group
 old_noaid  0.895 0.0115 82.2    0.872    0.918  1    
 old_aid    0.927 0.0119 82.2    0.903    0.951  1    
 new_noaid  0.931 0.0119 82.2    0.907    0.954  1    
 new_aid    0.996 0.0115 82.2    0.973    1.019   2   

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 4 estimates 
significance level used: alpha = 0.05 

location-shift


 method      lsmean     SE   df lower.CL upper.CL .group
 old_aid   -0.26019 0.0670 73.0  -0.3937   -0.127  1    
 new_aid    0.00433 0.0654 71.6  -0.1260    0.135   2   
 new_noaid  0.16075 0.0670 73.0   0.0273    0.294   2   
 old_noaid  0.20987 0.0654 71.6   0.0795    0.340   2   

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 4 estimates 
significance level used: alpha = 0.05 

scale-shift


 method    lsmean     SE   df lower.CL upper.CL .group
 old_aid     0.98 0.0235 80.5    0.933     1.03  1    
 new_aid     1.00 0.0229 80.2    0.955     1.05  1    
 new_noaid   1.22 0.0235 80.5    1.171     1.26   2   
 old_noaid   1.23 0.0229 80.2    1.186     1.28   2   

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 4 estimates 
significance level used: alpha = 0.05 

Combine in a table


      method lsmean         SE       df    lower.CL   upper.CL .group
1  old_noaid   0.80 0.01336283 71.71593  0.77505017  0.8283304     1 
2  new_noaid   0.82 0.01368490 73.10574  0.79103026  0.8455769     1 
3    old_aid   0.84 0.01368490 73.10574  0.80896610  0.8635127     1 
4    new_aid   0.96 0.01336283 71.71593  0.93742162  0.9907019      2
5  old_noaid   0.72 0.02018866 78.02513  0.68314130  0.7635260     1 
6  new_noaid   0.76 0.02071126 78.69466  0.72183593  0.8042904     1 
7    old_aid   0.78 0.02071126 78.69466  0.73580708  0.8182615     1 
8    new_aid   0.96 0.02018866 78.02513  0.92048520  1.0008699      2
9  old_noaid   0.90 0.01154918 82.21618  0.87221825  0.9181664     1 
10   old_aid   0.93 0.01186563 82.21619  0.90345285  0.9506600     1 
11 new_noaid   0.93 0.01186563 82.21619  0.90696996  0.9541771     1 
12   new_aid   1.00 0.01154918 82.21618  0.97346732  1.0194155      2
13   old_aid   0.98 0.02354788 80.52846  0.93333244  1.0270465     1 
14   new_aid   1.00 0.02293871 80.17793  0.95476558  1.0460615     1 
15 new_noaid   1.22 0.02354788 80.52846  1.17124116  1.2649553      2
16 old_noaid   1.23 0.02293871 80.17793  1.18573173  1.2770276      2
17   old_aid  -0.26 0.06697058 72.97180 -0.39366578 -0.1267197     1 
18   new_aid   0.00 0.06539710 71.56846 -0.12604678  0.1347136      2
19 new_noaid   0.16 0.06697058 72.97180  0.02727531  0.2942214      2
20 old_noaid   0.21 0.06539710 71.56846  0.07949339  0.3402537      2

Interrater reliability

Two methods were used here: the overall concordance correlation coefficient (OCCC) and the intra-class correlation coefficient (ICC).
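The computing chunks are not shown; both statistics can be obtained from a leaflets × raters matrix of estimates, sketched below with epi.occc from epiR and icc from irr (building the `ratings` matrix from the data is not shown, and the icc arguments are an assumption about the model used):

```r
library(epiR) # epi.occc
library(irr)  # icc

# ratings: a 50 x 20 matrix (leaflets x raters) for one assessment
occc_out <- epi.occc(ratings, na.rm = FALSE, pairs = TRUE)
occc_out$occc                                   # overall CCC

icc(ratings, model = "twoway", unit = "single") # ICC with 95% CI
```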

No aid old


[1] 0.7465934

[1] 0.8375556

No aid new


[1] 0.7360109

[1] 0.8127776

Aid old


[1] 0.7597936

[1] 0.8303938

Aid new


[1] 0.9409826

[1] 0.9464353

Summary table

Method      OCCC       ICC        ICC_l      ICC_u
old_noaid   0.7465934  0.8375556  0.7796401  0.8901147
old_aid     0.7360109  0.8127776  0.7485236  0.8722027
new_noaid   0.7597936  0.8303938  0.7705829  0.8849706
new_aid     0.9409826  0.9464353  0.9240251  0.9651455